Project Outline

This report is Part 1 of a five-part series exploring and analyzing ocean buoy data collected from NOAA-maintained National Data Buoy Center (NDBC) stations. In this report, we explore and compare predictions and recorded observations of water column movement, known as ocean current, at the west entrance to the Strait of Juan de Fuca near Neah Bay, Washington. In Part 2 we will look at meteorological (wind and wave) data from the Neah Bay Buoy and examine the potential for significant meteorological events to introduce noise in ocean current observations. In Part 3 we will introduce meteorological data for another location, NDBC Station 46088 (New Dungeness Buoy), and compare trends in wave height, period, and direction with those of the Neah Bay Buoy, with the aim of highlighting the relationship between swell events at the two buoys. In Part 4 we will walk through the considerations and processes involved in training and testing a supervised ML model to predict the class of wave that might occur at the New Dungeness Buoy given conditions at the Neah Bay Buoy. In Part 5 we will put our final classifier model into production by supplying forecasted conditions for the Neah Bay Station and determining the predicted class of wave at the New Dungeness Station.

More detailed information regarding the NDBC, and the locations of buoys they maintain, can be found on their website.

Executive Summary: Part 1

For Part 1, the objective is to become familiar with ocean current predictions and how they compare with recorded observations at the Neah Bay Buoy (NDBC Station 46087). We begin with a basic visualization of daily, weekly, and monthly predictions. Then we progress by overlaying ocean current observations.

We notice instances where ocean current observations follow predictions almost identically, and other instances where observations seem erratic. We conclude the visual exploration with a series of yearly plots of ocean current predictions and observations.

Data

The data came from two separate sources: the observations were recorded by instrumentation attached to NDBC Station 46087, while the prediction data were sourced from https://tides.mobilegeographics.com/locations/7867.html.

The recorded observation data was nicely formatted and available for download in yearly ‘.txt’ files from the NDBC website https://www.ndbc.noaa.gov/station_history.php?station=46087. I compiled these available observations into a single dataset ranging from year 2011 through 2019. After cleaning and wrangling, here’s a summary table and quick glimpse of the observation data:

Summary of Ocean Current Observation Data

##        id           date_time                        cm_s         
##  46087_o:120069   Min.   :2011-04-13 01:00:00   Min.   :-300.000  
##                   1st Qu.:2014-02-12 21:30:00   1st Qu.: -28.800  
##                   Median :2015-11-03 20:00:00   Median :  10.000  
##                   Mean   :2015-11-18 13:01:23   Mean   :   7.005  
##                   3rd Qu.:2018-01-08 09:00:00   3rd Qu.:  34.300  
##                   Max.   :2019-12-31 23:30:00   Max.   : 300.000  
##       degT       dir           depth    
##  Min.   :  0.0   E:21472   Min.   :1.6  
##  1st Qu.:108.0   N:31584   1st Qu.:1.6  
##  Median :246.0   S:18502   Median :1.6  
##  Mean   :206.7   W:48511   Mean   :1.6  
##  3rd Qu.:296.0             3rd Qu.:1.6  
##  Max.   :360.0             Max.   :1.6
## Rows: 120,069
## Columns: 6
## $ id        <fct> 46087_o, 46087_o, 46087_o, 46087_o, 46087_o, 46087_o, 460...
## $ date_time <dttm> 2011-04-13 01:00:00, 2011-04-13 01:30:00, 2011-04-13 02:...
## $ cm_s      <dbl> 9.2, 20.0, 21.9, 33.1, 39.8, 55.7, 55.1, 57.2, 65.0, 55.9...
## $ degT      <int> 110, 109, 120, 120, 129, 124, 133, 141, 106, 90, 91, 79, ...
## $ dir       <fct> E, E, E, E, E, E, E, S, E, E, E, E, E, E, E, E, E, E, E, ...
## $ depth     <dbl> 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1....
Summary Statistics

Direction   Average Deg True
E            86.99725
N           194.51627
S           179.60988
W           277.96990
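
A note on the N row: the N bin (defined in the field descriptions below) covers readings on both sides of the 0/360 wrap, so a plain arithmetic average of those bearings lands near the middle of the compass even though every reading points roughly north. If a representative direction per bin is wanted, a circular mean is one common alternative. The following is a minimal Python sketch of that idea, not the calculation used for the table above; the function name and the sample readings are hypothetical.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of compass bearings, returned as a bearing in degrees."""
    # Average the unit vectors for each bearing instead of the raw numbers,
    # so that 350 and 10 average to a northerly direction rather than 180.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

# Hypothetical northerly readings on either side of the 0/360 wrap
northerly = [350, 355, 5, 20]
arithmetic = sum(northerly) / len(northerly)  # 182.5, which points south
circular = circular_mean_deg(northerly)       # a few degrees east of north
```

The E, S, and W bins do not cross the wrap, so their arithmetic averages are not affected in the same way.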

The fields are relatively easy to understand, but we will walk through a definition and description for each:

  • id refers to the NDBC station id, 46087. The suffix ’_o’ indicates the entries are from the observations dataset.
  • date_time is the year, month, day, and time of the recorded observation. Observations are recorded every 30 minutes and stored in the GMT/UTC timezone. A fair number of recordings are missing from the data.
  • cm_s is the speed of the water column recorded in centimeters per second. For the purposes of this exploration positive entries indicate flooding currents, or Easterly water column movement, and negative entries indicate ebbing currents, or Westerly water column movement.
  • degT is the direction in degrees true of the movement of the water column at the time of the observation.
  • dir is a feature I created from degT; it denotes the compass direction of the water column movement at the time of the observation. N indicates readings between 315 and 45 (wrapping through 0/360), E indicates readings between and including 45 and 135, S indicates readings between 135 and 225, and W indicates readings between and including 225 and 315.
  • depth is the depth of the recorded observation. The instrument is set at 1.6 meters for this station.
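
Two of the descriptions above translate directly into code: the degT-to-dir binning and the 30-minute recording cadence. The sketch below is in Python and is only an illustration under those stated rules; the function names are hypothetical, and the dates and row count are copied from the summary output above.

```python
from datetime import datetime, timedelta

def bin_direction(deg_true):
    """Map a bearing in degrees true to N/E/S/W using the boundaries listed above."""
    if 45 <= deg_true <= 135:
        return "E"   # inclusive on both ends
    if 135 < deg_true < 225:
        return "S"   # exclusive on both ends
    if 225 <= deg_true <= 315:
        return "W"   # inclusive on both ends
    return "N"       # everything else: above 315 or below 45

def expected_slots(first, last, step=timedelta(minutes=30)):
    """Number of readings a gap-free record would hold between first and last."""
    return (last - first) // step + 1

def find_gaps(timestamps, step=timedelta(minutes=30)):
    """Return (before, after) pairs where consecutive readings are more than one step apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]

# First and last timestamps and the row count, taken from the summary above
first = datetime(2011, 4, 13, 1, 0)
last = datetime(2019, 12, 31, 23, 30)
rows = 120_069
coverage = rows / expected_slots(first, last)
```

With these inputs, coverage works out to roughly 79% of the possible half-hourly slots, which puts a number on the missing recordings. A helper like find_gaps could then locate where the missing stretches fall.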

Historical current prediction tables are not readily available. Even NOAA only supplies current predictions going back two years from the present date; see the NOAA Tides and Currents website: https://tidesandcurrents.noaa.gov/stationhome.html?id=9443090. To acquire prediction data going back as far as 2004, I had to source it from table objects on the tides.mobilegeographics website using Microsoft Excel’s Power Query feature. Since a nicely formatted text file was apparently not available, this process was arduous: the query had to be transformed before the data would render properly, and I was only able to access one month at a time for each year from 2004 to 2021. After pulling the prediction data through Excel, I compiled the predictions in R and performed fine-tuned cleaning and wrangling to create proper data types, clean up text, create dates and times with accurate timezones, and derive astronomical data such as moon phase for all prediction dates. Here is a basic summary table and quick glimpse of the predictions data:

Summary of Ocean Current Prediction Data

##        id                  MoonPhase       Date_Time                  
##  46087_p:47239   Waxing Gibbous :10248   Min.   :2004-01-01 10:32:00  
##                  Waning Crescent:10218   1st Qu.:2008-08-24 01:29:00  
##                  Waning Gibbous :10198   Median :2013-03-01 18:10:00  
##                  Waxing Crescent:10173   Mean   :2013-01-30 06:48:43  
##                  Full Moon      : 1689   3rd Qu.:2017-07-06 19:28:00  
##                  (Other)        : 4711   Max.   :2022-01-01 05:45:00  
##                  NA's           :    2                                
##    Event            cm_s              degT          dir       
##  Ebb  :14471   Min.   :-190.33   Min.   :115.0   E    :10923  
##  Flood:10923   1st Qu.: -51.44   1st Qu.:115.0   Slack:21845  
##  Slack:21845   Median :   0.00   Median :290.0   W    :14471  
##                Mean   : -13.76   Mean   :214.7                
##                3rd Qu.:   0.00   3rd Qu.:290.0                
##                Max.   : 180.04   Max.   :290.0                
##                                  NA's   :21845
## Rows: 47,239
## Columns: 7
## $ id        <fct> 46087_p, 46087_p, 46087_p, 46087_p, 46087_p, 46087_p, 460...
## $ MoonPhase <fct> Waxing Gibbous, Waxing Gibbous, Waxing Gibbous, Waxing Gi...
## $ Date_Time <dttm> 2004-01-01 10:32:00, 2004-01-01 12:42:00, 2004-01-01 15:...
## $ Event     <fct> Slack, Ebb, Slack, Flood, Slack, Ebb, Slack, Flood, Slack...
## $ cm_s      <dbl> 0.000, -30.864, 0.000, 36.008, 0.000, -108.024, 0.000, 66...
## $ degT      <int> NA, 290, NA, 115, NA, 290, NA, 115, NA, 290, NA, 115, NA,...
## $ dir       <fct> Slack, W, Slack, E, Slack, W, Slack, E, Slack, W, Slack, ...

Again, many of the fields are straightforward, but we will walk through a definition and description for each:

  • id refers to the location. The latitude and longitude for NDBC station 46087 were used to generate these prediction charts. The suffix ’_p’ indicates the entries are from the predictions dataset.
  • MoonPhase indicates the phase of the moon for the given date. ‘Full Moon’ is when the moon is fully visible, ‘New Moon’ is when the moon is not visible at all. Waning indicates that the amount of visible surface on the moon is shrinking. Waxing indicates that the amount of visible surface on the moon is growing. Crescent indicates that there is less than half of the moon visible. Gibbous indicates that there is more than half of the moon visible.
  • Date_Time is the year, month, day, and time of the predicted event. The times were acquired in PST/PDT and were translated to the GMT/UTC timezone. Considerable effort went into verifying that the translation is accurate.
  • Event indicates the type of event predicted to occur at the indicated date and time. ‘Slack’ refers to no water movement. ‘Flood’ refers to the maximum flood, or maximum Easterly water column movement. ‘Ebb’ refers to the maximum ebb, or maximum Westerly water column movement.
  • cm_s is the predicted speed of the water column in centimeters per second. For the purposes of this exploration positive entries indicate flooding currents, or Easterly water column movement, and negative entries indicate ebbing currents, or Westerly water column movement. The predictions data was originally presented in knots (nautical miles per hour), and cm/s values were calculated using a factor of 51.444 cm/s per knot.
  • degT is the direction in degrees true of the movement of the water column for the predicted event. The data source provided 115 as the mean direction for Flood events, and 290 as the mean direction for Ebb events. Slack events were recorded with ‘NA’ for the degT field.
  • dir is a feature I created using the data from degT; it denotes the direction of the water column movement for the predicted event. ‘E’ indicates Flood events, ‘W’ indicates Ebb events, and ‘Slack’ indicates Slack events.
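
The two conversions described above (local time to UTC, and knots to signed cm/s) can be sketched in Python as follows. This is an illustration rather than the wrangling code actually used. It assumes each source row carries a PST or PDT label and an unsigned speed in knots, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# 1 knot = 1852 m per hour, which is about 51.444 cm/s
CM_S_PER_KNOT = 1852 * 100 / 3600

# Fixed UTC offsets keyed by the abbreviation assumed to accompany each source row
LOCAL_OFFSETS = {
    "PST": timezone(timedelta(hours=-8)),
    "PDT": timezone(timedelta(hours=-7)),
}

def to_utc(local_text, tz_abbrev):
    """Parse a 'YYYY-MM-DD HH:MM' local time and return a timezone-aware UTC datetime."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M")
    return local.replace(tzinfo=LOCAL_OFFSETS[tz_abbrev]).astimezone(timezone.utc)

def signed_cm_s(speed_knots, event):
    """Convert a speed in knots to cm/s: Flood positive, Ebb negative, Slack zero."""
    sign = {"Flood": 1.0, "Ebb": -1.0, "Slack": 0.0}[event]
    return sign * abs(speed_knots) * CM_S_PER_KNOT
```

For example, a local reading of 02:32 PST on 2004-01-01 maps to 10:32 UTC, the first Date_Time value in the glimpse above. One small observation: the cm_s values in the glimpse (-30.864, 36.008, -108.024) are exact multiples of 0.1 knot times 51.44, so the factor may have been rounded to two decimals in practice. The difference from 51.444 is negligible for plotting.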

Explore with Visualizations

First, let’s explore the prediction data to get a better understanding of how it is organized. Here we see predictions for a single day, March 19th, 2014:

Notice there are positive and negative speeds. A mark in the positive region indicates a peak flood event, or maximum East flowing current, while a mark in the negative region indicates a peak ebb event, or maximum West flowing current. Predicted slack events are indicated with a mark at zero.

Now let’s zoom out for a weekly and monthly view of March 2014 (note that Slack Events have been removed):

Alright, now let’s overlay data for the observed currents:

It appears that at times the observations follow the predictions well, while at other times the observations fall well outside the predicted range, mostly in the positive direction. Also, what is happening around March 10th? Why are some of the flood events predicted to be negative? They are not; there are simply three ebb events on those days.
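
One way to confirm which days carry three ebb events is to count predicted events per calendar date. The Python sketch below shows the idea on made-up rows; the timestamps are illustrative only and are not taken from the predictions dataset, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

def events_per_day(predictions, event="Ebb"):
    """Count how many predictions of the given event type fall on each calendar date."""
    return Counter(ts.date() for ts, kind in predictions if kind == event)

# Made-up (timestamp, Event) rows for illustration only, with Slack rows removed
sample = [
    (datetime(2014, 3, 10, 0, 40), "Ebb"),
    (datetime(2014, 3, 10, 6, 55), "Ebb"),
    (datetime(2014, 3, 10, 13, 20), "Flood"),
    (datetime(2014, 3, 10, 20, 5), "Ebb"),
    (datetime(2014, 3, 11, 3, 10), "Flood"),
    (datetime(2014, 3, 11, 9, 45), "Ebb"),
]

ebb_counts = events_per_day(sample)
# Dates with three or more predicted ebb peaks
busy_days = [day for day, n in ebb_counts.items() if n >= 3]
```

Run against the real predictions, a count like this would list every date with an extra ebb peak, not only those around March 10th.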

Let’s take a closer look around March 10th, 2014:

Now let’s look at each month of the year for 2014:

My initial observation is that Ebb Events are generally predicted to be stronger than Flood Events. Ebb Events are regularly predicted to be in the -100 cm/s range, while Flood Events are regularly in the 50 cm/s range.

The June, July, and August observations are almost exactly aligned with the predictions, while the second half of September through December all show observations which are much smaller than predicted and differently organized. Perhaps seasonal storm activity has an effect on the instrument’s readings, and an attempt was made to correct for this interference, leading to these ‘suppressed’ observation values. I can imagine a 15ft+ swell introducing some variation in the current reading as the buoy is lifted and dropped through the peak and trough of the swell. The NDBC’s website does not describe the method by which it determines the reading at a given time (whether it is an average over a period, whether they attempt to correct for strong swell or wind effects, etc.), but more information regarding their data descriptions and measurement techniques can be found on this webpage: https://www.ndbc.noaa.gov/measdes.shtml. It would also be relevant to compare these dates with swell data, which we will do in part 2 of this project.

Next, let’s look at yearly sequences to see if any seasonal trends become apparent. This will also highlight our missing observation data. Here are graphs for years 2011 to 2019:

Very cool, there is a lot going on here. Late 2018 and most of the 2019 data look noisy. I’m not sure why they appear so different from the previous years’ data. Maybe an instrument malfunctioned, barnacle growth or seaweed got caught in the instrument, or maintenance was performed which altered the readings, or perhaps there was an issue in data transmission and the values were encoded or decoded inaccurately.

I notice a couple of periods in the timeline where the observations seem to be compressed: from the winter of 2014 through the spring of 2015, and from July to October of 2016. In part 2, we will explore wave data and compare the timelines of these trends to see if any patterns align with these periods.
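
The visual impression of ‘compressed’ observations could be backed by a simple number, for example a per-month ratio of typical observed speed to typical predicted peak speed. The Python sketch below is one possible way to do this and is not part of the original analysis; the function name and toy values are hypothetical. Because it compares all half-hourly readings against peak predictions, the ratio sits below 1 even in a well-aligned month, so it is only meaningful when compared across months.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def monthly_amplitude_ratio(observed, predicted_peaks):
    """
    observed: iterable of (datetime, cm_s) readings.
    predicted_peaks: iterable of (datetime, cm_s) Flood/Ebb peaks, Slack removed.
    Returns {(year, month): mean |observed| / mean |predicted peak|}.
    """
    obs_by_month = defaultdict(list)
    pred_by_month = defaultdict(list)
    for ts, speed in observed:
        obs_by_month[(ts.year, ts.month)].append(abs(speed))
    for ts, speed in predicted_peaks:
        pred_by_month[(ts.year, ts.month)].append(abs(speed))

    ratios = {}
    for month, obs in obs_by_month.items():
        pred = pred_by_month.get(month)
        # Skip months with no predictions or with all-zero predictions
        if pred and mean(pred) > 0:
            ratios[month] = mean(obs) / mean(pred)
    return ratios

# Toy values for illustration only
toy_observed = [
    (datetime(2014, 6, 1), 40), (datetime(2014, 6, 2), -60),
    (datetime(2014, 11, 1), 10), (datetime(2014, 11, 2), -10),
]
toy_predicted = [
    (datetime(2014, 6, 1), 100), (datetime(2014, 6, 2), -100),
    (datetime(2014, 11, 1), 50), (datetime(2014, 11, 2), -150),
]
toy_ratios = monthly_amplitude_ratio(toy_observed, toy_predicted)
```

Months whose ratio drops well below that of a well-aligned month (June through August 2014, say) would be candidates for the compressed periods noted above.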

Other patterns I took note of include the presence of periods where the observed flood seems to be stronger in general than the observed ebb, followed by periods where the ebb seems to be stronger than the flood. For example, look at the graph of 2014. Moving sequentially starting at the first of the year, there is a ‘spike’ in the negative direction followed by a ‘spike’ in the positive direction. This pattern of ‘offset spikes’ repeats itself with some ambiguity through June 2014. Here is a closer look:

Summary, Next Steps, and the Bigger Picture

As we have seen, observations of ocean currents recorded at NDBC Station 46087 are erratic. Sometimes they align almost identically with predicted currents, while at other times observations are off the charts or severely suppressed. I believe other meteorological factors come into play and have an effect on the observed ocean current.

In part 2 of this project we will explore and visualize characteristics of features such as wave, wind, and atmospheric pressure from recorded observations at NDBC Station 46087. In addition, we will compare time series of these features with the periods of interest noted in part 1. In part 3 of this project we will dive into buoy data from NDBC Station 46088, also known as the New Dungeness Buoy. The intention will be to compare data from Station 46087 with data from 46088 to determine a list of dates where swell was recorded passing through the Strait. It will be necessary to set thresholds for wind speed to filter out strong northwest wind events which cause local windswell, and I’m sure many more challenges and considerations will present themselves.

My goal in pursuing this project, Exploring Ocean Buoy Data, is to validate data and gain a better understanding of relationships among features, in order to train and develop a supervised machine learning model to predict the class of swell in the Strait of Juan de Fuca. This will be a complex and multifaceted task, with ample consideration required before sound model development can begin. The intent is to produce a deployable model that takes a set of forecasted conditions at the Neah Bay Buoy (swell size/period/direction, wind speed/direction, tides/current predictions, date, etc.) and produces a prediction for the class of wave that will occur at the New Dungeness Buoy.